
    Privacy-Aware and Secure Decentralized Air Quality Monitoring

    Indoor air quality monitoring is a major asset for improving quality of life and building management. Today, the evolution of embedded technologies allows such monitoring to be implemented at the edge of the network. However, several concerns must be addressed: data security and privacy, routing and sink placement optimization, protection from external monitoring, and distributed data mining. In this paper, we describe an integrated framework that addresses these concerns through distributed storage, blockchain-based role-based access control (RBAC), onion routing, routing and sink placement optimization, and distributed data mining. We describe the organization of our contribution and demonstrate its relevance with simulations and experiments over a set of use cases.
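    Onion routing is one of the framework's listed components. The sketch below is only a toy model of the layering idea, not the authors' implementation: each hop can open only the layer addressed to it and learns only the next hop, so no single relay sees the whole route. The names `Layer`, `build_onion`, `peel` and `deliver` are hypothetical, and the `Layer` objects stand in for ciphertexts; no real encryption is performed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Layer:
    # Stand-in for a ciphertext that only `recipient` could decrypt.
    # NO real encryption happens here; this only models who learns what.
    recipient: str
    next_hop: Optional[str]
    inner: Union["Layer", str]


def build_onion(route: List[str], payload: str) -> Layer:
    """Wrap the payload innermost-first, so the first hop's layer ends up outermost."""
    if not route:
        raise ValueError("route must contain at least one node")
    onion: Union[Layer, str] = payload
    for i in reversed(range(len(route))):
        next_hop = route[i + 1] if i + 1 < len(route) else None
        onion = Layer(route[i], next_hop, onion)
    assert isinstance(onion, Layer)
    return onion


def peel(onion: Layer, node: str) -> Tuple[Optional[str], Union[Layer, str]]:
    """A node can open only the layer addressed to it and learns only the next hop."""
    if onion.recipient != node:
        raise PermissionError(f"{node} cannot open a layer addressed to {onion.recipient}")
    return onion.next_hop, onion.inner


def deliver(onion: Layer) -> Tuple[List[str], str]:
    """Forward the onion hop by hop; return the nodes visited and the final payload."""
    visited: List[str] = []
    current: Union[Layer, str] = onion
    node = onion.recipient
    while isinstance(current, Layer):
        visited.append(node)
        next_hop, current = peel(current, node)
        if next_hop is not None:
            node = next_hop
    return visited, current


# Hypothetical route from a sensor to the sink.
onion = build_onion(["sensor-3", "sensor-7", "sink"], "mean PM2.5 over last hour?")
print(deliver(onion))  # (['sensor-3', 'sensor-7', 'sink'], 'mean PM2.5 over last hour?')
```

    In a real deployment each layer would be encrypted with a key only its recipient holds; the dataclass merely makes the information flow explicit.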

    Using words from daily news headlines to predict the movement of stock market indices

    Stock market analysis is one of the biggest areas of interest for text mining. Many researchers have proposed approaches that use textual information to predict the movement of stock market indices. Many of these approaches focus either on maximising the predictive accuracy of the model or on devising alternative methods for model evaluation. In this paper, we propose a more descriptive approach that focuses on the models themselves and tries to identify the individual words in the text that most affect the movement of stock market indices. We use data from two sources covering the past eight years: the daily data for the Dow Jones Industrial Average index (‘open’ and ‘close’ values for each trading day) and the headlines of the 25 most-voted news items on the Reddit World News Channel for the previous trading days. By applying machine learning algorithms to these data and analysing the individual words that appear in the final predictive models, we find that the words ‘gay’, ‘propaganda’ and ‘massacre’ are typically associated with a daily increase of the stock index, while the word ‘IRAN’ mostly coincides with its decrease. While this work presents a first step towards qualitative analysis of stock market models, there is still plenty of room for improvement.
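    The paper finds influential words by inspecting trained machine learning models; the sketch below does not reproduce that. It swaps in a much simpler, assumed technique, smoothed per-word log-odds, to illustrate how headline words can be associated with up and down days of an index. The tokenizer, the smoothing parameter `alpha`, the labelling rule (close above open counts as an 'up' day) and the toy headlines are all hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Day = Tuple[List[str], float, float]  # (headlines, index open, index close)


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic runs only (a simplifying assumption).
    return re.findall(r"[a-z]+", text.lower())


def word_log_odds(days: List[Day], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed log-odds that a day whose headlines contain a word is an 'up' day.
    Positive scores lean towards increases, negative scores towards decreases."""
    up_counts: Counter = Counter()
    down_counts: Counter = Counter()
    for headlines, open_value, close_value in days:
        words = {w for h in headlines for w in tokenize(h)}  # count each word once per day
        if close_value > open_value:
            up_counts.update(words)
        else:
            # Assumption: unchanged days are grouped with decreases.
            down_counts.update(words)
    vocabulary = set(up_counts) | set(down_counts)
    return {w: math.log((up_counts[w] + alpha) / (down_counts[w] + alpha)) for w in vocabulary}


days = [
    (["Markets calm", "Rain expected"], 100.0, 101.0),
    (["Markets slump", "Rain expected"], 101.0, 99.0),
]
scores = word_log_odds(days)
print(round(scores["calm"], 3), round(scores["slump"], 3))  # 0.693 -0.693
```

    Words that appear equally often on up and down days, such as "markets" here, score exactly zero, which is one reason a model-based analysis like the paper's can surface different words than raw co-occurrence.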

    Using rule learning for subgroup discovery

    This dissertation investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of a selected population that are sufficiently large and statistically unusual in terms of class distribution. The dissertation presents a subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental evaluation of CN2-SD on selected data sets shows, compared with the rule learning algorithms CN2 and RIPPER, a substantial reduction in the number of induced rules, increased rule coverage, rule significance and overall coverage of the target concept, as well as slight improvements in the area under the ROC curve. An application of CN2-SD to a large traffic accident data set confirms these findings. The dissertation also presents the subgroup discovery algorithm APRIORI-SD, developed by adapting association rule learning to subgroup discovery. This was achieved by building a classification rule learner, APRIORI-C, and enhancing it with a novel post-processing mechanism, a new quality measure for induced rules (weighted relative accuracy) and probabilistic classification of instances. Experimental results show similar behavior for APRIORI-SD and CN2-SD, i.e., compared with the rule learning algorithms CN2, RIPPER and APRIORI-C, a substantial reduction in the number of induced rules, increased rule coverage, rule significance and overall coverage of the target concept, as well as slight improvements in the area under the ROC curve. A new optimization approach to subgroup discovery based on ROC analysis is also presented and implemented as an adaptation of the APRIORI-SD algorithm.
The implications of the trade-off between the number of rules, unusualness and coverage in subgroup discovery are investigated through an experimental evaluation of the adapted APRIORI-SD algorithm on selected data sets. The results are presented as 2D graphs depicting the dependencies between the number of induced rules, unusualness, accuracy and overall coverage of the target concept, and the original APRIORI-SD subgroup discovery algorithm is discussed within this new optimization framework. Finally, the dissertation compares the new algorithms with existing state-of-the-art subgroup discovery algorithms and applies CN2-SD and APRIORI-SD to a real-life problem: a database describing traffic accidents in Great Britain.
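    The abstract names weighted relative accuracy and a modified covering algorithm without giving formulas. The sketch below uses the standard definition WRAcc(Class <- Cond) = p(Cond) * (p(Class | Cond) - p(Class)), estimated from example weights, together with a greedy weighted covering loop. It assumes the additive weighting scheme w = 1 / (i + 1), where i counts how many selected rules already cover a positive example, as we understand it from the CN2-SD literature; treat that as an assumption. Candidate rule bodies are given up front (CN2's beam search is omitted), and the function names and accident-style data are hypothetical.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]
Condition = Callable[[Example], bool]  # rule body; the rule head is `target_class`


def weighted_wracc(cond: Condition, examples: List[Example], weights: List[float],
                   class_attr: str, target_class: str) -> float:
    """WRAcc(Class <- Cond) = p(Cond) * (p(Class | Cond) - p(Class)),
    with probabilities estimated from example weights instead of raw counts."""
    total = covered = covered_pos = all_pos = 0.0
    for example, w in zip(examples, weights):
        is_pos = example[class_attr] == target_class
        total += w
        if is_pos:
            all_pos += w
        if cond(example):
            covered += w
            if is_pos:
                covered_pos += w
    if total == 0.0 or covered == 0.0:
        return 0.0
    return (covered / total) * (covered_pos / covered - all_pos / total)


def weighted_covering(candidates: List[Condition], examples: List[Example],
                      class_attr: str, target_class: str, max_rules: int = 3) -> List[Condition]:
    """Greedy weighted covering: pick the candidate with the highest weighted WRAcc,
    then down-weight the positive examples it covers instead of removing them.
    Assumed additive scheme: w = 1 / (i + 1), where i counts previous coverings."""
    times_covered = [0] * len(examples)
    remaining = list(candidates)
    selected: List[Condition] = []
    while remaining and len(selected) < max_rules:
        weights = [1.0 / (i + 1) for i in times_covered]
        scores = [weighted_wracc(c, examples, weights, class_attr, target_class) for c in remaining]
        best_score = max(scores)
        if best_score <= 0.0:
            break  # no remaining rule beats the default class distribution
        best = remaining.pop(scores.index(best_score))
        selected.append(best)
        for j, example in enumerate(examples):
            if best(example) and example[class_attr] == target_class:
                times_covered[j] += 1
    return selected


# Toy accident-style data (hypothetical values, not from the dissertation).
data = [
    {"road": "wet", "light": "dark", "severe": "yes"},
    {"road": "wet", "light": "day", "severe": "yes"},
    {"road": "dry", "light": "dark", "severe": "yes"},
    {"road": "dry", "light": "day", "severe": "no"},
]
wet = lambda e: e["road"] == "wet"
dark = lambda e: e["light"] == "dark"
print(weighted_wracc(wet, data, [1.0] * len(data), "severe", "yes"))  # 0.125
print(weighted_covering([wet, dark], data, "severe", "yes") == [wet, dark])  # True
```

    Down-weighting rather than deleting covered examples is what lets overlapping subgroups (here "wet road" and "darkness") both be reported, instead of the second rule being judged only on the leftover examples.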

    Coverage-based classification using association rule mining

    Building accurate and compact classifiers for real-world applications is one of the crucial tasks in data mining today. In this paper, we propose a new method that reduces the number of class association rules produced by classical class association rule classifiers while maintaining an accurate classification model, comparable to those generated by state-of-the-art classification algorithms. More precisely, we propose a new associative classifier that selects “strong” class association rules based on their overall coverage of the learning set. The advantage of the proposed classifier is that it generates significantly smaller rule sets on larger datasets than traditional classifiers while maintaining classification accuracy. We also discuss how the overall coverage of such classifiers affects their classification accuracy. Experiments measuring classification accuracy, number of classification rules and other relevance measures such as precision, recall and F-measure on 12 real-life datasets from the UCI ML repository (Dua, D.; Graff, C. UCI Machine Learning Repository. Irvine, CA: University of California, 2019) show that our method is comparable to 8 other well-known rule-based classification algorithms. It achieved the second-highest average accuracy (84.9%) and the best result in terms of average number of rules among all classification methods. Although it did not achieve the best classification accuracy, our method produces compact and understandable classifiers by exhaustively searching the entire example space.
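    The abstract does not spell out how “strong” rules are chosen, so the sketch below substitutes a generic database-coverage heuristic in the style of classical associative classifiers; it is an assumption, not necessarily the authors' criterion. Rules are ranked by confidence, then support, then shorter antecedent, and a rule is kept only if it correctly classifies at least one training example not yet covered by a kept rule. Mining the candidate rules (for example with Apriori) is omitted, and `ClassRule`, `select_by_coverage` and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Example = Dict[str, str]


@dataclass(frozen=True)
class ClassRule:
    antecedent: FrozenSet[Tuple[str, str]]  # (attribute, value) items that must all match
    label: str                              # predicted class
    confidence: float
    support: float

    def covers(self, example: Example) -> bool:
        return all(example.get(attr) == value for attr, value in self.antecedent)


def select_by_coverage(rules: List[ClassRule], examples: List[Example],
                       class_attr: str) -> List[ClassRule]:
    """Rank rules by confidence, then support, then shorter antecedent; keep a rule
    only if it correctly classifies at least one still-uncovered example."""
    ranked = sorted(rules, key=lambda r: (r.confidence, r.support, -len(r.antecedent)),
                    reverse=True)
    uncovered = set(range(len(examples)))
    selected: List[ClassRule] = []
    for rule in ranked:
        if not uncovered:
            break  # every training example is already covered
        hits = {i for i in uncovered if rule.covers(examples[i])}
        if any(examples[i][class_attr] == rule.label for i in hits):
            selected.append(rule)
            uncovered -= hits  # covered examples no longer justify later rules
    return selected


train = [
    {"outlook": "sunny", "wind": "weak", "play": "yes"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
]
r1 = ClassRule(frozenset({("wind", "weak")}), "yes", 1.0, 2 / 3)
r2 = ClassRule(frozenset({("outlook", "sunny"), ("wind", "weak")}), "yes", 1.0, 1 / 3)
r3 = ClassRule(frozenset({("wind", "strong")}), "no", 1.0, 1 / 3)
kept = select_by_coverage([r2, r1, r3], train, "play")
print([r.label for r in kept])  # ['yes', 'no']
```

    Here `r2` is dropped because `r1` already covers its only example, which is how coverage-based selection keeps the rule set small without losing training-set coverage.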

    Privacy-Preserving Data Mining on Blockchain-Based WSNs

    Currently, the computational power of the sensors forming a wireless sensor network (WSN) allows most of the data processing and analysis to be implemented directly on the sensors in a decentralized way. This paradigm shift brings a corresponding shift in the privacy and security problems that need to be addressed. While a decentralized implementation avoids the single point of failure that typically affects centralized approaches, it is subject to other threats, such as external monitoring, and new challenges, such as the complexity of providing decentralized implementations of data mining algorithms. In this paper, we present a solution for privacy-aware distributed data mining on wireless sensor networks. Our solution uses a permissioned blockchain to avoid a single point of failure in the system. Contracts are used to construct an onion-like structure encompassing the Hoeffding trees and a route. The onion-routed query conceals the network identity of the sensors from external adversaries and obfuscates the actual computation to hide it from internally compromised nodes. We validate our solution on a use case involving an air quality monitoring sensor network and compare the quality of our model against traditional models to support the feasibility and viability of the solution.
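    Hoeffding trees decide when a leaf has seen enough examples to split by using the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split criterion (log2 of the number of classes for information gain) and n is the number of examples seen at the leaf. The sketch below shows that standard split test only; it does not model the distributed, onion-wrapped evaluation described in the paper. The defaults delta = 1e-7 and tie threshold 0.05 are illustrative assumptions, and the function names are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability 1 - delta, the true mean of a variable whose values span
    `value_range` lies within this epsilon of the mean of n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int, num_classes: int,
                 delta: float = 1e-7, tie_threshold: float = 0.05) -> bool:
    """VFDT-style split test on a leaf that has seen n examples.
    For information gain, the range R is log2(num_classes)."""
    eps = hoeffding_bound(math.log2(num_classes), delta, n)
    # Split if the best attribute is reliably better than the runner-up,
    # or if eps is so small that the two candidates are effectively tied.
    return best_gain - second_gain > eps or eps < tie_threshold


print(round(hoeffding_bound(1.0, 1e-7, 1000), 4))       # 0.0898
print(should_split(0.30, 0.10, n=1000, num_classes=2))  # True
print(should_split(0.30, 0.28, n=200, num_classes=2))   # False
```

    Because the bound shrinks with 1/sqrt(n), a sensor can grow the tree incrementally from its stream and split as soon as the evidence suffices, which is what makes Hoeffding trees a natural fit for resource-constrained, decentralized mining.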
